Abstract

We study strategy improvement algorithms for mean-payoff and parity games. We describe a 
structural property of these games, and we show that these structures can affect the behaviour of 
strategy improvement. We show how awareness of these structures can be used to accelerate strategy 
improvement algorithms. We call our algorithms non-oblivious because they remember properties 
of the game that they have discovered in previous iterations. We show that non-oblivious strategy 
improvement algorithms perform well on examples that are known to be hard for oblivious strategy 
improvement. Hence, we argue that previous strategy improvement algorithms fail because they 
ignore the structural properties of the game that they are solving. 

1 Introduction 

In this paper we study strategy improvement for two player infinite games played on finite graphs. In this 
setting the vertices of a graph are divided between two players. A token is placed on one of the vertices, 
and in each step the owner of the vertex upon which the token is placed must move the token along one 
of the outgoing edges of that vertex. In this fashion, the two players form an infinite path in the graph. 
The payoff of the game is then some property of this path, which depends on the type of game that is 
being played. Strategy improvement is a technique that originated from Markov decision processes Q,
and has since been applied to many types of games in this setting, including simple stochastic games O,
discounted-payoff games [12], mean-payoff games (H, and parity games [15] [TTl. In this paper we will
focus on the strategy improvement algorithm of Bjorklund and Vorobyov [2], which is designed to solve
mean-payoff games, but can also be applied to parity games.

Algorithms that solve parity and mean-payoff games have received much interest. One reason for 
this is that the model checking problem for the modal $\mu$-calculus is polynomial time equivalent to the
problem of solving a parity game 0Q41, and there is a polynomial time reduction from parity games to
mean-payoff games [12]. Therefore, faster algorithms for these games lead to faster model checkers for
the $\mu$-calculus. Secondly, both of these games lie in NP $\cap$ co-NP, which implies that neither of the two
problems is likely to be complete for either class. Despite this, no polynomial time algorithms have
been found. 

The approach of strategy improvement can be described as follows. The algorithm begins by choos- 
ing one of the players to be the strategy improver, and then picks an arbitrary strategy for that player. A 
strategy for a player consists of a function that picks one edge for each of that player's vertices. Strategy 
improvement then computes a set of profitable edges for that strategy. If the strategy is switched so that it 
chooses some subset of the profitable edges, rather than the edges that are currently chosen, then strategy 
improvement guarantees that the resulting strategy is better in some well-defined measure. So, the algo- 
rithm picks some subset of the profitable edges to create a new, improved, strategy to be considered in 
the next iteration. This process is repeated until a strategy is found that has no profitable edges, and this
strategy is guaranteed to be optimal for the strategy improver. Since any subset of the profitable edges could be
used to create an improved strategy in each iteration, some method is needed to determine which subset 
to choose in each iteration. We call this method a switching policy, and the choice of switching policy 
can have a dramatic effect on the running time of the algorithm. 

A significant amount of research has been dedicated to finding good switching policies. In terms 
of complexity bounds, the current best switching policies are randomized, and run in an expected
$O(2^{\sqrt{n \log n}})$ number of iterations [2]. Another interesting switching policy is the optimal switching policy
given by Schewe [13]. An optimal switching policy always picks the subset of profitable edges that yields
the best possible successor strategy, according to the measure that strategy improvement uses to compare 
strategies. It is not difficult to show that such a subset of profitable edges must exist, but computing an 
optimal subset of profitable edges seemed to be difficult, since there can be exponentially many subsets 
of profitable edges to check. Nevertheless, Schewe's result is a polynomial time algorithm that computes 
an optimal subset of edges. Therefore, optimal switching policies can now be realistically implemented. 
It is important to note that the word "optimal" applies only to the subset of profitable edges that is chosen 
to be switched in each iteration. It is not the case that a strategy improvement algorithm equipped with 
an optimal switching policy will have an optimal running time. 

Perhaps the most widely studied switching policy is the all-switches policy, which simply selects the 
entire set of profitable edges in every iteration. Although the best upper bound for this policy is $O(2^n/n)$
iterations ATI, it has been found to work extremely well in practice. Indeed, for a period of ten years
there were no known examples upon which the all-switches policy took significantly more than a linear
number of iterations. It was for this reason that the all-switches policy was widely held to be a contender 
for a proof of polynomial time termination. 

However, Friedmann has recently found a family of examples that force a strategy improvement 
algorithm equipped with the all-switches policy to take an exponential number of steps [5]. Using the
standard reductions [12], [17], these examples can be generalised to provide exponential lower bounds
for all-switches on mean-payoff and discounted-payoff games. Even more surprisingly, Friedmann's 
example can be generalised to provide an exponential lower bound for strategy improvement algorithms 
equipped with an optimal switching policy 0. This recent revelation appears to imply that there is no 
longer any hope for strategy improvement, since an exponential number of iterations can be forced even 
if the best possible improvement is made in every step. 

Our contributions. Despite ten years of research into strategy improvement algorithms, and the recent 
advances in the complexity of some widely studied switching policies, the underlying combinatorial 
structure of mean-payoff and parity games remains somewhat mysterious. There is no previous work 
which links the structural properties of a parity or mean-payoff game with the behaviour of strategy 
improvement on those games. In this paper, we introduce a structural property of these games that we 
call a snare. We show how the existence of a snare in a parity or mean-payoff game places a restriction 
on the form that a winning strategy can take for these games. Hence, we argue that every algorithm that 
computes a winning strategy for these games must, at least implicitly, deal with these structures. 

In the case of strategy improvement algorithms, we argue that snares play a fundamental role in the 
behaviour of these algorithms. We show that there is a certain type of profitable edge, which we call 
a back edge, that is the mechanism that strategy improvement uses to deal with snares. We show how 
each profitable back edge encountered by strategy improvement corresponds to some snare that exists in 
the game. Hence, we argue that the concept of a snare is a new tool that can be used in the analysis of 
strategy improvement algorithms. 

We then go on to show that, in addition to being an analytical tool, awareness of snares can be used 
to accelerate the process of strategy improvement. We propose that strategy improvement algorithms 
should remember the snares that they have seen in previous iterations, and we give a procedure that 
uses a previously recorded snare to improve a strategy. Strategy improvement algorithms can choose to 
apply this procedure instead of switching a subset of profitable edges. We give one reasonable example 
of a strategy improvement algorithm that uses these techniques. We call our algorithms non-oblivious 
strategy improvement algorithms because they remember information about their previous iterations, 
whereas previous techniques make their decisions based only on the information available in the current 
iteration.

In order to demonstrate how non-oblivious techniques can be more powerful than traditional strategy 
improvement, we study Friedmann's family of examples that cause the all-switches and the optimal 
switching policies to take exponential time. We show that in certain situations non-oblivious strategy 
improvement makes better progress than even the optimal oblivious switching policy. We go on to show 
that this behaviour allows our non-oblivious strategy improvement algorithms to terminate in polynomial 
time on Friedmann's examples. This fact implies that it is ignorance of snares that is a key failing of 
oblivious strategy improvement. 

2 Preliminaries 

A mean-payoff game is defined by a tuple $(V, V_{\mathrm{Max}}, V_{\mathrm{Min}}, E, w)$ where $V$ is a set of vertices and $E$ is a set
of edges, which together form a finite graph. Every vertex must have at least one outgoing edge. The
sets $V_{\mathrm{Max}}$ and $V_{\mathrm{Min}}$ partition $V$ into vertices belonging to player Max and vertices belonging to player
Min, respectively. The function $w : V \to \mathbb{Z}$ assigns an integer weight to every vertex.

The game begins by placing a token on a starting vertex $v_0$. In each step, the player that owns the
vertex upon which the token is placed must choose one outgoing edge of that vertex and move the token
along it. In this fashion, the two players form an infinite path $\pi = \langle v_0, v_1, v_2, \ldots \rangle$, where $(v_i, v_{i+1}) \in E$
for every $i \in \mathbb{N}$. The payoff of an infinite path is defined to be $\mathcal{M}(\pi) = \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n} w(v_i)$. The
objective of Max is to maximize the value of $\mathcal{M}(\pi)$, and the objective of Min is to minimize it.

A positional strategy for Max is a function that chooses one outgoing edge for every vertex belonging
to Max. A strategy is denoted by $\sigma : V_{\mathrm{Max}} \to V$, with the condition that $(v, \sigma(v)) \in E$ for every Max
vertex $v$. Positional strategies for player Min are defined analogously. The sets of positional strategies
for Max and Min are denoted by $\Pi_{\mathrm{Max}}$ and $\Pi_{\mathrm{Min}}$, respectively. Given two positional strategies $\sigma$ and $\tau$
for Max and Min respectively, and a starting vertex $v_0$, there is a unique path $\langle v_0, v_1, v_2, \ldots \rangle$, where
$v_{i+1} = \sigma(v_i)$ if $v_i$ is owned by Max and $v_{i+1} = \tau(v_i)$ if $v_i$ is owned by Min. This path is known as the play
induced by the two strategies $\sigma$ and $\tau$, and will be denoted by $\mathrm{Play}(v_0, \sigma, \tau)$.
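The definitions of $\mathrm{Play}(v_0, \sigma, \tau)$ and $\mathcal{M}$ can be illustrated with a short Python sketch. It assumes a hypothetical encoding of our own, not taken from the paper: `owner` maps each vertex to `"Max"` or `"Min"`, `weight` maps vertices to integers, and a positional strategy is a dictionary from a vertex to its chosen successor. Because both strategies are positional, the play is a finite prefix followed by a cycle repeated forever, so the limit average is simply the average weight of that cycle.

```python
from fractions import Fraction


def play(v0, sigma, tau, owner):
    """Return (prefix, cycle) of the lasso-shaped play Play(v0, sigma, tau)."""
    path, position = [], {}
    v = v0
    while v not in position:
        position[v] = len(path)
        path.append(v)
        v = sigma[v] if owner[v] == "Max" else tau[v]
    start = position[v]  # the first repeated vertex starts the cycle
    return path[:start], path[start:]


def mean_payoff(v0, sigma, tau, owner, weight):
    """M(Play(v0, sigma, tau)): the average weight of the repeated cycle."""
    _, cycle = play(v0, sigma, tau, owner)
    return Fraction(sum(weight[c] for c in cycle), len(cycle))


# Hypothetical three-vertex game.
owner = {"a": "Max", "b": "Min", "c": "Max"}
weight = {"a": -1, "b": 3, "c": -2}
sigma = {"a": "b", "c": "c"}
print(play("a", sigma, {"b": "a"}, owner))                 # ([], ['a', 'b'])
print(mean_payoff("a", sigma, {"b": "a"}, owner, weight))  # 1
print(mean_payoff("a", sigma, {"b": "c"}, owner, weight))  # -2
```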

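For all $v \in V$ we define:

$$\mathrm{Value}_*(v) = \max_{\sigma \in \Pi_{\mathrm{Max}}} \min_{\tau \in \Pi_{\mathrm{Min}}} \mathcal{M}(\mathrm{Play}(v, \sigma, \tau))$$
$$\mathrm{Value}^*(v) = \min_{\tau \in \Pi_{\mathrm{Min}}} \max_{\sigma \in \Pi_{\mathrm{Max}}} \mathcal{M}(\mathrm{Play}(v, \sigma, \tau))$$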

These are known as the lower and upper values, respectively. For mean-payoff games we have that the 
two quantities are equal, a property called determinacy. 

Theorem 1 ([10]). For every starting vertex $v$ in every mean-payoff game we have $\mathrm{Value}_*(v) = \mathrm{Value}^*(v)$.

For this reason, we define $\mathrm{Value}(v)$ to be the value of the game starting at the vertex $v$, which is
equal to both $\mathrm{Value}_*(v)$ and $\mathrm{Value}^*(v)$. The computational task associated with mean-payoff games is to
find $\mathrm{Value}(v)$ for every vertex $v$.

Computing the 0-mean partition is a decision version of this problem. This requires us to decide 
whether Value(v) > 0, for every vertex v. Bjorklund and Vorobyov have shown that only a polynomial 
number of calls to an algorithm for finding the 0-mean partition are needed to find the value for every 
vertex in a mean-payoff game 0. 

A Max strategy $\sigma$ is a winning strategy for a set of vertices $W$ if $\mathcal{M}(\mathrm{Play}(v, \sigma, \tau)) > 0$ for every Min
strategy $\tau$ and every vertex $v$ in $W$. Similarly, a Min strategy $\tau$ is a winning strategy for $W$ if $\mathcal{M}(\mathrm{Play}(v, \sigma, \tau)) \le 0$
for every Max strategy $\sigma$ and every vertex $v$ in $W$. To solve the 0-mean partition problem we are
required to partition the vertices of the graph into the sets $(W_{\mathrm{Max}}, W_{\mathrm{Min}})$, where Max has a winning
strategy for $W_{\mathrm{Max}}$ and Min has a winning strategy for $W_{\mathrm{Min}}$.
Figure 1: A simple snare.

3 Snares 

In this section we introduce a structure that we call a "snare". The dictionary definition¹ of the
word snare is "something that serves to entangle the unwary". This is a particularly apt metaphor for 
these structures since, as we will show, a winning strategy for a player must be careful to avoid being 
trapped by the snares that are present in that player's winning set. 

The definitions in this section could be formalized for either player. We choose to focus on player 
Max because we will later choose Max to be the strategy improver. For a set of vertices $W$ we define
$G \upharpoonright W$ to be the subgame induced by $W$, which is $G$ with every vertex not in $W$ removed. A snare for
player Max is defined to be a subgame for which player Max can guarantee a win from every vertex.

Definition 2 (Max Snare). For a game $G$, a snare is defined to be a tuple $(W, \chi)$ where $W \subseteq V$ and
$\chi : W \cap V_{\mathrm{Max}} \to W$ is a partial strategy for player Max that is winning for every vertex in the subgame
$G \upharpoonright W$.

This should be compared with the concept of a dominion that was introduced by Jurdziński, Paterson,
and Zwick [8]. A dominion is also a subgame in which one of the players can guarantee a win, but with
the additional constraint that the opponent is unable to leave the dominion. By contrast, the opponent 
may be capable of leaving a snare. We define an escape edge for Min to be an edge that Min can use to 
leave a Max snare. 

Definition 3 (Escapes). Let $W$ be a set of vertices. We define the escapes from $W$ as
$\mathrm{Esc}(W) = \{(v, u) \in E : v \in W \cap V_{\mathrm{Min}} \text{ and } u \notin W\}$.
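As a small illustration, $\mathrm{Esc}(W)$ can be computed directly from the definition. The sketch below uses a hypothetical dictionary encoding (`owner`, `edges`) of our own; the vertex names and the extra vertex `x` are invented for the example.

```python
def escapes(W, owner, edges):
    """Esc(W): edges (v, u) with v a Min vertex in W and u outside W."""
    W = set(W)
    return {(v, u) for v in W if owner[v] == "Min" for u in edges[v] if u not in W}


# Hypothetical game: a two-vertex cycle v <-> u plus an outside vertex x.
owner = {"v": "Max", "u": "Min", "x": "Max"}
edges = {"v": ["u", "x"], "u": ["v", "x"], "x": ["x"]}
print(escapes({"v", "u"}, owner, edges))  # {('u', 'x')}
```

Note that the Max edge $(v, x)$ also leaves the set but is not an escape, since only Min's edges count.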

It is in Min's interests to use at least one escape edge from a snare, since if Min stays in a Max snare
forever, then Max can use the strategy $\chi$ to ensure a positive payoff. In fact, we can prove that if $\tau$ is a
winning strategy for Min for some subset of vertices then $\tau$ must use at least one escape from every Max
snare that exists in that subset of vertices.

Theorem 4. Suppose that $\tau$ is a winning strategy for Min on a set of vertices $S$. If $(W, \chi)$ is a Max snare
where $W \subseteq S$, then there is some edge $(v, u)$ in $\mathrm{Esc}(W)$ such that $\tau(v) = u$.

Figure 1 shows an example of a subgame upon which a snare can be defined. In all of our diagrams,
boxes are used to represent Max vertices and triangles are used to represent Min vertices. The weight
assigned to each vertex is shown on that vertex. If we take $W = \{v, u\}$ and $\chi(v) = u$ then $(W, \chi)$ will be a
Max snare in every game that contains this structure as a subgame. This is because the cycle is positive,
and therefore $\chi$ is a winning strategy for Max on the subgame induced by $W$. There is one escape from this snare,
which is the edge Min can use to break the cycle at $u$.
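Whether $(W, \chi)$ is a Max snare can be checked by looking at cycles: once Max's moves inside $W$ are fixed by $\chi$, every play in $G \upharpoonright W$ has positive mean payoff exactly when every cycle Min can form there has positive weight. The sketch below tests this with the Floyd-Warshall algorithm. The function name is hypothetical, and so are the weights (2 on $v$, negative on $u$), since the weights drawn in Figure 1 are not reproduced here. It assumes that $\chi$ is defined on every Max vertex of $W$ and that every Min vertex of $W$ keeps at least one edge inside $W$.

```python
import math


def is_max_snare(W, chi, owner, edges, weight):
    """True iff every cycle of G restricted to W, with Max's moves fixed by chi,
    has positive total weight (so chi wins from every vertex of W)."""
    verts = sorted(set(W))
    idx = {v: i for i, v in enumerate(verts)}
    n = len(verts)
    # d[i][j]: least weight of a path i -> j, where an edge (x, y) costs w(x).
    d = [[math.inf] * n for _ in range(n)]
    for v in verts:
        succs = [chi[v]] if owner[v] == "Max" else [u for u in edges[v] if u in idx]
        for u in succs:
            if u not in idx:
                return False  # chi must not leave W
            d[idx[v]][idx[u]] = min(d[idx[v]][idx[u]], weight[v])
    # Floyd-Warshall; afterwards d[i][i] <= 0 for some i exactly when some
    # cycle has weight <= 0 (d[i][i] stays inf if no cycle passes through i).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return all(d[i][i] > 0 for i in range(n))


owner = {"v": "Max", "u": "Min", "x": "Max"}
edges = {"v": ["u", "x"], "u": ["v", "x"], "x": ["x"]}
print(is_max_snare({"v", "u"}, {"v": "u"}, owner, edges, {"v": 2, "u": -1, "x": 0}))  # True
print(is_max_snare({"v", "u"}, {"v": "u"}, owner, edges, {"v": 2, "u": -2, "x": 0}))  # False
```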

Since the example is so simple, Theorem 4 gives a particularly strong property for this snare: every
winning strategy for Min must use the escape edge at u. If Min uses the edge (u, v) in some strategy, then 
Max can respond by using the edge (v, u) to guarantee a positive cycle, and therefore the strategy would 
not be winning for Min. This is a strong property because we can essentially ignore the edge (u,v) in 
every game into which the example is embedded. This property does not hold for snares that have more 
than one escape. 

¹ American Heritage Dictionary of the English Language, Fourth Edition.






Non-oblivious Strategy Improvement 



John Fearnley 



4 Strategy Improvement 

In this section we will summarise Bjorklund and Vorobyov's strategy improvement algorithm for finding 
the 0-mean partition of a mean-payoff game [2]. Their algorithm requires that the game is modified by 
adding retreat edges from every Max vertex to a special sink vertex. 

Definition 5 (Modified Game). A game $(V, V_{\mathrm{Max}}, V_{\mathrm{Min}}, E, w)$ will be modified to create
$(V \cup \{s\}, V_{\mathrm{Max}} \cup \{s\}, V_{\mathrm{Min}}, E', w')$, where $E' = E \cup \{(v, s) : v \in V_{\mathrm{Max}}\}$, and $w'(v) = w(v)$ for all vertices $v$ in $V$, and
$w'(s) = 0$.
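Definition 5 amounts to a simple graph transformation. A minimal sketch, using a hypothetical dictionary encoding of our own and naming the sink `"s"` by assumption:

```python
def modify(owner, edges, weight, sink="s"):
    """Definition 5: add a sink of weight 0 and a retreat edge from every Max vertex."""
    owner2 = dict(owner)
    owner2[sink] = "Max"  # the sink is placed in V_Max, as in Definition 5
    edges2 = {v: list(succ) + ([sink] if owner[v] == "Max" else [])
              for v, succ in edges.items()}
    edges2[sink] = []  # no outgoing edges: plays that reach the sink end there
    weight2 = dict(weight)
    weight2[sink] = 0
    return owner2, edges2, weight2


owner2, edges2, weight2 = modify({"a": "Max", "b": "Min"},
                                 {"a": ["b"], "b": ["a"]},
                                 {"a": 1, "b": -1})
print(edges2)  # {'a': ['b', 's'], 'b': ['a'], 's': []}
```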

Strategy improvement always works with the modified game, and for the rest of the paper we will 
assume that the game has been modified. 

Given two strategies, one for each player, the play induced by the two strategies is either a finite path 
that ends at the sink or a finite initial path followed by an infinitely repeated cycle. This is used to define 
the valuation of a vertex. 

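Definition 6 (Valuation). Let $\sigma$ be a positional strategy for Max and $\tau$ be a positional strategy for
Min. If $\mathrm{Play}(v_0, \sigma, \tau) = \langle v_0, v_1, \ldots, v_k, \langle c_0, c_1, \ldots, c_l \rangle^{\omega} \rangle$, for some vertex $v_0$, then we define $\mathrm{Val}^{\sigma,\tau}(v_0) = -\infty$
if $\sum_{i=0}^{l} w(c_i) \le 0$, and $\infty$ otherwise. Alternatively, if $\mathrm{Play}(v_0, \sigma, \tau) = \langle v_0, v_1, \ldots, v_k, s \rangle$ then we define
$\mathrm{Val}^{\sigma,\tau}(v_0) = \sum_{i=0}^{k} w(v_i)$.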
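Definition 6 can be evaluated by following the play until it either reaches the sink or revisits a vertex. The following sketch is our own illustration in a hypothetical dictionary encoding (sink named `"s"`); Max's strategy is never consulted at the sink, which has no outgoing edges.

```python
import math


def valuation(v0, sigma, tau, owner, weight, sink="s"):
    """Val^{sigma,tau}(v0) in the modified game, following Definition 6."""
    path, v = [], v0
    while v != sink and v not in path:
        path.append(v)
        v = sigma[v] if owner[v] == "Max" else tau[v]
    if v == sink:
        # The play reaches the sink: sum the weights of v_0, ..., v_k (w(s) = 0).
        return sum(weight[x] for x in path)
    # Otherwise the play ends in a cycle starting at the first repeated vertex.
    cycle = path[path.index(v):]
    return math.inf if sum(weight[c] for c in cycle) > 0 else -math.inf


owner = {"a": "Max", "b": "Min", "s": "Max"}
sigma = {"a": "b"}
print(valuation("a", sigma, {"b": "s"}, owner, {"a": 2, "b": 3}))   # 5
print(valuation("a", sigma, {"b": "a"}, owner, {"a": 2, "b": -3}))  # -inf
```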

Strategy improvement algorithms choose one player to be the strategy improver, which we choose to
be Max. For a Max strategy $\sigma$, we define $\mathrm{br}(\sigma)$ to be the best response to $\sigma$, which is a Min strategy
with the property $\mathrm{Val}^{\sigma,\mathrm{br}(\sigma)}(v) \le \mathrm{Val}^{\sigma,\tau}(v)$ for every vertex $v$ and every Min strategy $\tau$. Such a strategy
always exists, and Bjorklund and Vorobyov give a method to compute it in polynomial time (H. We will
frequently want to refer to the valuation of a vertex $v$ when the Max strategy $\sigma$ is played against $\mathrm{br}(\sigma)$,
so we define $\mathrm{Val}^{\sigma}(v)$ to be shorthand for $\mathrm{Val}^{\sigma,\mathrm{br}(\sigma)}(v)$. Occasionally, we will need to refer to valuations
from multiple games. We use $\mathrm{Val}^{\sigma}_{G}(v)$ to give the valuation of the vertex $v$ when $\sigma$ is played against
$\mathrm{br}(\sigma)$ in the game $G$. We extend all of our notations in a similar manner, by placing the game in the
subscript.

For a Max strategy $\sigma$ and an edge $(v, u)$ that is not chosen by $\sigma$, we say $(v, u)$ is profitable in $\sigma$ if
$\mathrm{Val}^{\sigma}(\sigma(v)) < \mathrm{Val}^{\sigma}(u)$. Switching an edge $(v, u)$ in $\sigma$ is denoted by $\sigma[v \mapsto u]$. This operation creates a
new strategy where, for a vertex $w \in V_{\mathrm{Max}}$, we have $\sigma[v \mapsto u](w) = u$ if $w = v$, and $\sigma(w)$ otherwise. Let $F$
be a set of edges that contains at most one outgoing edge from each vertex. We define $\sigma[F]$ to be $\sigma$ with
every edge in $F$ switched. The concept of profitability is important because switching profitable edges
creates an improved strategy.
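Profitability and the switching operation $\sigma[F]$ translate directly into code. In this sketch the valuations $\mathrm{Val}^{\sigma}$ are simply given as a dictionary of hypothetical numbers, since computing them is a separate step; the function names are our own.

```python
def profitable_edges(sigma, val, edges):
    """Edges (v, u) not chosen by sigma with Val(sigma(v)) < Val(u)."""
    return {(v, u) for v, current in sigma.items() for u in edges[v]
            if u != current and val[current] < val[u]}


def switch(sigma, F):
    """sigma[F]: switch every edge in F (at most one edge per vertex)."""
    sources = [v for v, _ in F]
    if len(sources) != len(set(sources)):
        raise ValueError("F may contain at most one outgoing edge per vertex")
    new_sigma = dict(sigma)
    new_sigma.update(F)  # each (v, u) pair becomes new_sigma[v] = u
    return new_sigma


sigma = {"x": "s"}
edges = {"x": ["s", "y"]}
val = {"s": 0, "x": 1, "y": 2}  # hypothetical valuations Val^sigma
P = profitable_edges(sigma, val, edges)
print(P)                 # {('x', 'y')}
print(switch(sigma, P))  # {'x': 'y'}
```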

Theorem 7 (HI). Let $\sigma$ be a strategy and $P$ be the set of edges that are profitable in $\sigma$. Let $F \subseteq P$ be
a subset of the profitable edges that contains at most one outgoing edge from each vertex. For every
vertex $v$ we have $\mathrm{Val}^{\sigma}(v) \le \mathrm{Val}^{\sigma[F]}(v)$, and there is a vertex for which the inequality is strict.

The second property that can be shown is that a strategy with no profitable edges is optimal. An
optimal strategy is a Max strategy $\sigma$ such that $\mathrm{Val}^{\sigma}(v) \ge \mathrm{Val}^{\sigma'}(v)$ for every Max strategy $\sigma'$ and every
vertex $v$. The 0-mean partition can be derived from an optimal strategy $\sigma$: the set $W_{\mathrm{Max}}$ contains every
vertex $v$ with $\mathrm{Val}^{\sigma}(v) = \infty$, and $W_{\mathrm{Min}}$ contains every vertex $v$ with $\mathrm{Val}^{\sigma}(v) < \infty$.

Theorem 8 (0). A strategy with no profitable edges is optimal. 

Strategy improvement begins by choosing a strategy $\sigma_0$ with the property that $\mathrm{Val}^{\sigma_0}(v) > -\infty$ for
every vertex $v$. One way to achieve this is to set $\sigma_0(v) = s$ for every vertex $v$ in $V_{\mathrm{Max}}$. This guarantees the
property unless there is some negative cycle that Min can enforce without passing through a Max vertex.
Clearly, for a vertex $v$ on one of these cycles, Max has no strategy $\sigma$ with $\mathrm{Val}^{\sigma}(v) > -\infty$. These vertices
can therefore be removed in a preprocessing step and placed in $W_{\mathrm{Min}}$.

For every strategy $\sigma_i$, a new strategy $\sigma_{i+1} = \sigma_i[F]$ will be computed, where $F$ is a subset of the
profitable edges in $\sigma_i$, which contains at most one outgoing edge from each vertex. Theorem 7 implies
that $\mathrm{Val}^{\sigma_{i+1}}(v) \ge \mathrm{Val}^{\sigma_i}(v)$ for every vertex $v$, and that there is a vertex for which the inequality is strict.
This implies that a strategy cannot be visited twice by strategy improvement. The fact that there is a
finite number of positional strategies for Max implies that strategy improvement must eventually reach
a strategy $\sigma_k$ in which no edges are profitable. Theorem 8 implies that $\sigma_k$ is an optimal strategy, and
strategy improvement terminates.
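The loop just described can be run end to end on a tiny game. The sketch below does not use Bjorklund and Vorobyov's polynomial method: for simplicity it computes $\mathrm{Val}^{\sigma}(v)$ by brute force, as the minimum of $\mathrm{Val}^{\sigma,\tau}(v)$ over every positional Min strategy $\tau$ (this equals $\mathrm{Val}^{\sigma,\mathrm{br}(\sigma)}(v)$ because a best response exists), so it takes exponential time and is only meant for toy examples. The `policy` argument plays the role of a switching policy; the one-edge policy, the function names, and the two-vertex game are hypothetical, and the starting strategy is assumed to satisfy $\mathrm{Val}^{\sigma_0}(v) > -\infty$.

```python
import itertools
import math


def valuation(v0, sigma, tau, owner, weight, sink="s"):
    """Val^{sigma,tau}(v0) in the modified game (Definition 6)."""
    path, v = [], v0
    while v != sink and v not in path:
        path.append(v)
        v = sigma[v] if owner[v] == "Max" else tau[v]
    if v == sink:
        return sum(weight[x] for x in path)
    cycle = path[path.index(v):]
    return math.inf if sum(weight[c] for c in cycle) > 0 else -math.inf


def values(sigma, owner, edges, weight, sink="s"):
    """Val^sigma(v) by brute force over every positional Min strategy."""
    mins = [v for v in owner if owner[v] == "Min"]
    best = {v: math.inf for v in owner}
    for choice in itertools.product(*(edges[v] for v in mins)):
        tau = dict(zip(mins, choice))
        for v in owner:
            best[v] = min(best[v], valuation(v, sigma, tau, owner, weight, sink))
    return best


def improve(sigma, policy, owner, edges, weight, sink="s"):
    """Switch the edges chosen by policy(P, val) until nothing is profitable."""
    while True:
        val = values(sigma, owner, edges, weight, sink)
        P = {(v, u) for v in sigma for u in edges[v]
             if u != sigma[v] and val[sigma[v]] < val[u]}
        if not P:
            return sigma, val  # by Theorem 8 this strategy is optimal
        sigma = dict(sigma)
        for v, u in policy(P, val):
            sigma[v] = u


owner = {"x": "Max", "y": "Min", "s": "Max"}
edges = {"x": ["s", "y"], "y": ["x", "s"], "s": []}
weight = {"x": 1, "y": 2, "s": 0}
switch_one = lambda P, val: [min(P)]  # hypothetical policy: switch one edge
sigma, val = improve({"x": "s"}, switch_one, owner, edges, weight)
print(sigma, val["x"], val["y"])  # {'x': 'y'} 3 2
```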

Strategy improvement requires a rule that determines which profitable edges are switched in each 
iteration. We will call this a switching policy. Oblivious switching policies are defined as $\alpha : 2^E \to 2^E$,
where for every set $P \subseteq E$, we have that $\alpha(P)$ contains at most one outgoing edge for each vertex.

Some of the most widely studied switching policies are all-switches policies. These policies always 
switch every vertex that has a profitable edge, and when a vertex has more than one profitable edge an 
additional rule must be given to determine which edge to choose. Traditionally this choice is made by 
choosing the successor with the highest valuation. We must also be careful to break ties when there are 
two or more successors with the highest valuation. Therefore, for the purposes of defining this switching 
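policy we will assume that each vertex $v$ is given a unique index in the range $\{1, 2, \ldots, |V|\}$, which we
will denote as $\mathrm{Index}(v)$.

$$\mathrm{All}(P) = \{(v, u) \in P : \text{there is no edge } (v, w) \in P \text{ with } \mathrm{Val}^{\sigma}(u) < \mathrm{Val}^{\sigma}(w) \text{ or with } \mathrm{Val}^{\sigma}(u) = \mathrm{Val}^{\sigma}(w) \text{ and } \mathrm{Index}(u) < \mathrm{Index}(w)\}$$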
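The All policy can be sketched as follows, assuming the valuations and the `index` numbering are supplied as dictionaries (hypothetical values). For each vertex it keeps the profitable edge whose successor has the highest valuation and, among equal valuations, the successor with the highest index, which is the condition in the definition of $\mathrm{All}(P)$.

```python
def all_switches(P, val, index):
    """All(P): for every vertex with a profitable edge, keep the edge to the
    successor with the highest valuation, breaking ties by the highest Index."""
    best = {}
    for v, u in P:
        if v not in best or (val[u], index[u]) > (val[best[v]], index[best[v]]):
            best[v] = u
    return set(best.items())


P = {("a", "b"), ("a", "c"), ("d", "b")}
val = {"b": 1, "c": 1}
index = {"b": 2, "c": 3}
print(sorted(all_switches(P, val, index)))  # [('a', 'c'), ('d', 'b')]
```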

In the introduction we described optimal switching policies, which we can now formally define. A 
switching policy is optimal if it selects a subset of profitable edges $F$ that satisfies $\mathrm{Val}^{\sigma[H]}(v) \le \mathrm{Val}^{\sigma[F]}(v)$
for every subset of profitable edges $H$ and every vertex $v$. Schewe has given a method to compute such a
set in polynomial time [13]. We will denote an optimal switching policy as Optimal. 

5 Strategy Trees 

The purpose of this section is to show how a strategy and its best response can be viewed as a tree, and 
to classify profitable edges by their position in this tree. We will classify edges as either cross edges or 
back edges. We will later show how profitable back edges are closely related to snares. 

It is technically convenient for us to make the assumption that every vertex has a finite valuation 
under every strategy. The choice of starting strategy ensures that for every strategy $\sigma$ considered by
strategy improvement, we have $\mathrm{Val}^{\sigma}(v) > -\infty$ for every vertex $v$. Obviously, there may be strategies
under which some vertices have a valuation of $\infty$. The first part of this section is dedicated to rephrasing
the problem so that our assumption can be made.

We define the positive cycle problem to be the problem of finding a strategy $\sigma$ with $\mathrm{Val}^{\sigma}(v) = \infty$ for
some vertex $v$, or to prove that there is no strategy with this property. The latter can be done by finding
an optimal strategy $\sigma$ with $\mathrm{Val}^{\sigma}(v) < \infty$ for every vertex $v$. We can prove that a strategy improvement
algorithm for the positive cycle problem can be adapted to find the 0-mean partition. 

Proposition 9. Let $\alpha$ be a strategy improvement algorithm that solves the positive cycle problem in $O(k)$
time. There is a strategy improvement algorithm which finds the 0-mean partition in $O(|V| \cdot k)$ time.

We consider switching policies that solve the positive cycle problem, and so we can assume that every
vertex has a finite valuation under every strategy that our algorithms consider. Our switching policies
will terminate when a vertex with infinite valuation is found. With this assumption we can define the
strategy tree.

Figure 2: A strategy tree.

Definition 10 (Strategy Tree). Given a Max strategy $\sigma$ and a Min strategy $\tau$ we define the tree
$T^{\sigma,\tau} = (V, E')$ where $E' = \{(v, u) : \sigma(v) = u \text{ or } \tau(v) = u\}$.

In other words, $T^{\sigma,\tau}$ is a tree rooted at the sink whose edges are those chosen by $\sigma$ and $\tau$. We define
$T^{\sigma}$ to be shorthand for $T^{\sigma,\mathrm{br}(\sigma)}$, and $\mathrm{Subtree}^{\sigma} : V \to 2^V$ to be the function that gives the vertices in
the subtree rooted at the vertex $v$ in $T^{\sigma}$.
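Since every play under the two strategies ends at the sink, the strategy tree can be stored as one parent pointer per vertex, and $\mathrm{Subtree}$ becomes a search over the reversed pointers. The sketch below uses hypothetical function names and a hypothetical instance chosen so that the subtree of $v$ is the set $\{v, b, c, d, u\}$ given in the discussion of Figure 2; the actual edges of the figure are not reproduced here.

```python
def tree_children(sigma, tau, owner, sink="s"):
    """Children lists of T^{sigma,tau}: the parent of x is sigma(x) or tau(x)."""
    children = {}
    for x in owner:
        if x == sink:
            continue
        parent = sigma[x] if owner[x] == "Max" else tau[x]
        children.setdefault(parent, []).append(x)
    return children


def subtree(v, children):
    """Subtree(v): v together with every vertex whose path to the sink passes through v."""
    result, stack = set(), [v]
    while stack:
        x = stack.pop()
        result.add(x)
        stack.extend(children.get(x, []))
    return result


owner = {"s": "Max", "v": "Max", "d": "Max", "a": "Min", "b": "Min", "c": "Min", "u": "Min"}
sigma = {"v": "s", "d": "v"}
tau = {"a": "s", "b": "v", "c": "d", "u": "b"}
print(sorted(subtree("v", tree_children(sigma, tau, owner))))  # ['b', 'c', 'd', 'u', 'v']
```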

We can now define our classification for profitable edges. Let $(v, u)$ be a profitable edge in the
strategy $\sigma$. We call this a profitable back edge if $u$ is in $\mathrm{Subtree}^{\sigma}(v)$, otherwise we call it a profitable
cross edge.

Figure 2 gives an example of a strategy tree. In all of our diagrams, dashed lines give a strategy $\sigma$ for
player Max, and dotted lines show Min's best response to the strategy of Max. The strategy tree contains
every vertex, and every edge that is either dashed or dotted. The subtree of $v$ is the set $\{v, b, c, d, u\}$. The
edge $(v, u)$ is profitable because $\mathrm{Val}^{\sigma}(v) =$ and $\mathrm{Val}^{\sigma}(u) = 1$. Since $u$ is contained in the subtree of $v$,
the edge $(v, u)$ is a profitable back edge.
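The back/cross classification only needs the tree's parent pointers: $u$ lies in $\mathrm{Subtree}^{\sigma}(v)$ exactly when walking from $u$ towards the sink passes through $v$. The following sketch assumes a hypothetical parent map whose subtree of $v$ is $\{v, b, c, d, u\}$, as in the example; the extra edge $(d, a)$ is invented to show a cross edge.

```python
def classify(profitable, parent):
    """Split profitable edges into back edges (u in Subtree(v)) and cross edges.
    parent[x] is x's successor in the strategy tree; the sink has no entry."""
    def in_subtree(u, v):
        # u is in Subtree(v) iff walking from u towards the root meets v.
        while u is not None:
            if u == v:
                return True
            u = parent.get(u)
        return False

    back = {(v, u) for (v, u) in profitable if in_subtree(u, v)}
    return back, set(profitable) - back


parent = {"v": "s", "d": "v", "a": "s", "b": "v", "c": "d", "u": "b"}
back, cross = classify({("v", "u"), ("d", "a")}, parent)
print(back, cross)  # {('v', 'u')} {('d', 'a')}
```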

6 Profitable Back Edges 

In this section we will expose the intimate connection between profitable back edges and snares. We will 
show how every profitable back edge corresponds to some snare that exists in the game. We will also 
define the concept of snare consistency, and we will show how this concept is linked with the conditions 
implied by Theorem 4.

Our first task is to show how each profitable back edge corresponds to some Max snare in the game. 
Recall that a Max snare consists of a set of vertices, and a strategy for Max that is winning for the 
subgame induced by those vertices. We will begin by defining the set of vertices for the snare that 
corresponds to a profitable back edge. For a profitable back edge $(v, u)$ in a strategy $\sigma$ we define the
critical set, which is the vertices in $\mathrm{Subtree}^{\sigma}(v)$ that Min can reach when Max plays $\sigma$.
Definition 11 (Critical Set). If $(v, u)$ is a profitable back edge in the strategy $\sigma$, then we define the critical
set as $\mathrm{Critical}^{\sigma}(v, u) = \{w \in \mathrm{Subtree}^{\sigma}(v) : \text{there is a path } \langle u = u_0, u_1, \ldots, u_k = w \rangle \text{ where for all } i \text{ with } 0 \le i < k$
$\text{we have } u_i \in \mathrm{Subtree}^{\sigma}(v) \text{ and if } u_i \in V_{\mathrm{Max}} \text{ then } u_{i+1} = \sigma(u_i)\}$.

In the example given in Figure 2, the critical set of the edge $(v, u)$ is $\{v, b, d, u\}$. The vertex $b$ is in the
critical set because it is in the subtree of $v$, and Min can reach it from $u$ when Max plays $\sigma$. In contrast,
the vertex $c$ is not in the critical set because $\sigma(d) = v$, and therefore Min cannot reach $c$ from $u$ when
Max plays $\sigma$. The vertex $a$ is not in the critical set because it is not in the subtree of $v$.
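The critical set is a reachability computation restricted to $\mathrm{Subtree}^{\sigma}(v)$, in which Min may take any edge but Max must follow $\sigma$. The sketch below takes the subtree as an input set; the function name and the instance are hypothetical, chosen to reproduce the critical set $\{v, b, d, u\}$ described above, with $\sigma(d) = v$ cutting off $c$.

```python
def critical_set(u, sigma, owner, edges, subtree_v):
    """Critical(v, u): vertices of Subtree(v) reachable from u while staying
    inside Subtree(v), when Max follows sigma and Min may use any edge."""
    seen, stack = {u}, [u]
    while stack:
        x = stack.pop()
        succs = [sigma[x]] if owner[x] == "Max" else edges[x]
        for y in succs:
            if y in subtree_v and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


owner = {"v": "Max", "d": "Max", "a": "Min", "b": "Min", "c": "Min", "u": "Min"}
edges = {"u": ["b", "d"], "b": ["v", "a"], "c": ["d"], "a": ["s"]}  # Min edges only
sigma = {"v": "s", "d": "v"}
print(sorted(critical_set("u", sigma, owner, edges, {"v", "b", "c", "d", "u"})))
# ['b', 'd', 'u', 'v']
```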

Note that in the example, $\sigma[v \mapsto u]$ is a winning strategy for the subgame induced by the critical set. The
definition of the critical set is intended to capture the largest connected subset of vertices contained in
the subtree of $v$ for which $\sigma[v \mapsto u]$ is guaranteed to be a winning strategy.

Proposition 12. Let $(v, u)$ be a profitable back edge in the strategy $\sigma$ and let $C$ be $\mathrm{Critical}^{\sigma}(v, u)$. The
strategy $\sigma[v \mapsto u]$ is winning for every vertex in $G \upharpoonright C$.

We can now formally define the snare that is associated with each profitable back edge that is
encountered by strategy improvement. For a profitable back edge $(v, u)$ in a strategy $\sigma$ we define
$\mathrm{Snare}^{\sigma}(v, u) = (\mathrm{Critical}^{\sigma}(v, u), \chi)$ where $\chi(x) = \sigma[v \mapsto u](x)$ if $x \in \mathrm{Critical}^{\sigma}(v, u)$, and $\chi$ is undefined at other
vertices. Proposition 12 confirms that this meets the definition of a snare.

We will now argue that the conditions given by Theorem 4 must be observed in order for strategy
improvement to terminate. We begin by defining a concept that we call snare consistency. We say that a 
Max strategy is consistent with a snare if Min's best response chooses an escape from that snare. 

Definition 13 (Snare Consistency). A strategy $\sigma$ is said to be consistent with the snare $(W, \chi)$ if $\mathrm{br}(\sigma)$
uses some edge in $\mathrm{Esc}(W)$.
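Snare consistency is a check on the best response alone. A minimal sketch, assuming $\mathrm{br}(\sigma)$ is given as a dictionary from Min vertices to successors; the function name and the values are hypothetical, modelled on the Figure 2 discussion with $W = \{v, b, d, u\}$.

```python
def is_consistent(W, br, owner):
    """sigma is consistent with the snare (W, chi) iff br(sigma) uses an escape
    from W, i.e. some Min vertex in W has its best-response successor outside W."""
    W = set(W)
    return any(br[v] not in W for v in W if owner[v] == "Min")


owner = {"v": "Max", "d": "Max", "b": "Min", "u": "Min"}
W = {"v", "b", "d", "u"}
print(is_consistent(W, {"b": "v", "u": "b"}, owner))  # False
print(is_consistent(W, {"b": "a", "u": "b"}, owner))  # True
```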

In the example given in Figure 2 we can see that $\sigma$ is not consistent with $\mathrm{Snare}^{\sigma}(v, u)$. This is
because $\mathrm{br}(\sigma)$ does not choose the edge $(b, a)$. However, once the edge $(v, u)$ is switched we can prove
that $\mathrm{br}(\sigma[v \mapsto u])$ must use the edge $(b, a)$. This is because Min has no other way of connecting every
vertex in $\mathrm{Subtree}^{\sigma}(v)$ to the sink, and if some vertex is not connected to the sink then its valuation will
rise to $\infty$.

Proposition 14. Let $(v, u)$ be a profitable back edge in the strategy $\sigma$. There is some edge $(x, y)$ in
$\mathrm{Esc}(\mathrm{Critical}^{\sigma}(v, u))$ such that $\mathrm{br}(\sigma[v \mapsto u])(x) = y$.

We can show that strategy improvement cannot terminate unless the current strategy is consistent 
with every snare that exists in the game. This is because every strategy that is not consistent with some 
snare must contain a profitable edge. 

Proposition 15. Let $\sigma$ be a strategy that is not consistent with a snare $(W, \chi)$. There is a profitable edge
$(v, u)$ in $\sigma$ such that $\chi(v) = u$.

These two propositions give us a new tool to study the process of strategy improvement. Instead of 
viewing strategy improvement as a process that tries to increase valuations, we can view it as a process 
that tries to force consistency with Max snares. Proposition 15 implies that this process can only terminate
when the current strategy is consistent with every Max snare in the game. Therefore, the behaviour
of strategy improvement on an example is strongly related to the snares that exist for the strategy
improver in that example.
7 Using Snares To Guide Strategy Improvement

In the previous sections, we have shown the strong link between snares and strategy improvement. In 
this section we will show how this insight can be used to guide strategy improvement. We will give a 
procedure that takes a strategy that is inconsistent with some snare, and returns an improved strategy 
that is consistent with that snare. Since the procedure is guaranteed to produce an improved strategy, 
it can be used during strategy improvement as an alternative to switching a profitable edge. We call 
algorithms that make use of this procedure non-oblivious strategy improvement algorithms, and we give 
a reasonable example of such an algorithm. 

To define our procedure we will use Proposition 15. Recall that this proposition implies that if a
strategy $\sigma$ is inconsistent with a snare $(W, \chi)$, then there is some profitable edge $(v, u)$ in $\sigma$ such that
$\chi(v) = u$. Our procedure will actually be a strategy improvement switching policy. This policy will
always choose to switch an edge that is chosen by $\chi$ but not by the current strategy. As long as the
current strategy remains inconsistent with $(W, \chi)$, such an edge is guaranteed to exist, and the policy
terminates once the current strategy is consistent with the snare. This procedure is shown as Algorithm 1.

Algorithm 1 FixSnare($\sigma$, $(W, \chi)$)
while $\sigma$ is inconsistent with $(W, \chi)$ do
    $(v, u)$ := some edge where $\chi(v) = u$ and $(v, u)$ is profitable in $\sigma$
    $\sigma$ := $\sigma[v \mapsto u]$
end while
return $\sigma$



In each iteration the switching policy switches one vertex $v$ to an edge $(v, u)$ with the property that
$\chi(v) = u$, and it never switches a vertex at which the current strategy agrees with $\chi$. It is therefore not
difficult to see that if the algorithm has not terminated after $|W|$ iterations then the current strategy will
agree with $\chi$ on every vertex in $W$. We can prove that such a strategy must be consistent with $(W, \chi)$,
and therefore the switching policy must terminate after at most $|W|$ iterations.
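Algorithm 1 can be transcribed into Python as follows. The consistency test and the computation of profitable edges are passed in as hypothetical callables, since both depend on computing $\mathrm{br}(\sigma)$; the toy stand-ins in the usage example only exercise the loop and do not model real game semantics. The algorithm leaves the choice among several candidate edges open, and this sketch simply takes the smallest one so that runs are deterministic.

```python
def fix_snare(sigma, W, chi, is_consistent, profitable_edges):
    """Algorithm 1 (FixSnare): repeatedly switch a profitable edge chosen by chi
    until sigma is consistent with the snare (W, chi)."""
    sigma = dict(sigma)
    while not is_consistent(sigma, W, chi):
        candidates = [(v, u) for (v, u) in profitable_edges(sigma) if chi.get(v) == u]
        if not candidates:
            # Proposition 15 rules this out when (W, chi) is a genuine snare.
            raise ValueError("no profitable edge agrees with chi")
        v, u = min(candidates)  # any candidate is allowed; min() is deterministic
        sigma[v] = u
    return sigma


# Toy stand-ins: "consistent" once sigma agrees with chi on W, and every
# edge to p or q that sigma does not currently use counts as profitable.
chi = {"x": "p", "y": "q"}
agrees = lambda s, W, c: all(s[v] == c[v] for v in W)
all_other = lambda s: {(v, u) for v in s for u in ("p", "q") if u != s[v]}
print(fix_snare({"x": "q", "y": "p", "z": "p"}, {"x", "y"}, chi, agrees, all_other))
# {'x': 'p', 'y': 'q', 'z': 'p'}
```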

Proposition 16. Let $\sigma$ be a strategy that is not consistent with a snare $(W, \chi)$. Algorithm 1 will arrive at
a strategy $\sigma'$ which is consistent with $(W, \chi)$ after at most $|W|$ iterations.

Since FixSnare is implemented as a strategy improvement switching policy that switches only prof- 
itable edges, the strategy that is produced must be an improved strategy. Therefore, at any point during 
the execution of strategy improvement we can choose not to switch a subset of profitable edges and run 
FixSnare instead. Note that the strategy produced by FixSnare may not be reachable from the current 
strategy by switching a subset of profitable edges. This is because FixSnare switches a sequence of 
profitable edges, some of which may not have been profitable in the original strategy. 

We propose a new class of strategy improvement algorithms that are aware of snares. These al- 
gorithms will record a snare for every profitable back edge that they encounter during their execution. 
In each iteration these algorithms can either switch a subset of profitable edges or run the procedure 
FixSnare on some recorded snare that the current strategy is inconsistent with. We call these algorithms 
non-oblivious strategy improvement algorithms, and the general schema that these algorithms follow is
shown in Algorithm 2.

Recall that oblivious strategy improvement algorithms required a switching policy to specify which
profitable edges should be switched in each iteration. Clearly, non-oblivious strategy improvement
algorithms require a similar method to decide whether to apply the procedure FixSnare or to pick some
subset of profitable edges to switch. Moreover, they must decide which snare should be used when the
procedure FixSnare is applied. We do not claim to have the definitive non-oblivious switching policy, but
in the rest of this section we will present one reasonable method of constructing a non-oblivious version
of an oblivious switching policy. We will later show that our non-oblivious strategy improvement
algorithms behave well on the examples that are known to cause exponential time behaviour for oblivious
strategy improvement.

Algorithm 2 NonOblivious(σ)

S := ∅
while σ has a profitable edge do
    S := S ∪ {Snare_σ(v, u) : (v, u) is a profitable back edge in σ}
    σ := Policy(σ, S)
end while
return σ
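The schema of Algorithm 2 can be sketched in Python as below. It is only an illustration of the control flow: `profitable_edges`, `is_back_edge`, `snare_of` and `policy` are hypothetical callbacks, a snare is stored as a hashable pair (a frozen vertex set and a tuple of χ-edges), and the recorded set S is kept as a duplicate-free list so that a policy can iterate over it in a stable order.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]
Strategy = Dict[str, str]
Snare = Tuple[FrozenSet[str], Tuple[Edge, ...]]  # (W, chi) in hashable form


def non_oblivious(
    sigma: Strategy,
    profitable_edges: Callable[[Strategy], Set[Edge]],
    is_back_edge: Callable[[Strategy, Edge], bool],
    snare_of: Callable[[Strategy, Edge], Snare],
    policy: Callable[[Strategy, List[Snare]], Strategy],
) -> Tuple[Strategy, List[Snare]]:
    """Schema of Algorithm 2; all four callbacks are hypothetical placeholders."""
    recorded: List[Snare] = []  # S := empty set
    while True:
        P = profitable_edges(sigma)
        if not P:  # no profitable edge is left, so strategy improvement stops
            return sigma, recorded
        for edge in sorted(P):
            if is_back_edge(sigma, edge):
                snare = snare_of(sigma, edge)
                if snare not in recorded:  # S := S united with {Snare_sigma(v, u)}
                    recorded.append(snare)
        sigma = policy(sigma, recorded)  # sigma := Policy(sigma, S)


# Toy instance: one Max vertex "a" whose successors are ranked "0" < "1" < "2".
def toy_profitable(s: Strategy) -> Set[Edge]:
    return {("a", k) for k in "012" if k > s["a"]}


def toy_back_edge(s: Strategy, e: Edge) -> bool:
    return e == ("a", "1")  # pretend that this edge closes a cycle


def toy_snare(s: Strategy, e: Edge) -> Snare:
    return (frozenset({"a"}), (e,))


def toy_policy(s: Strategy, S: List[Snare]) -> Strategy:
    return {"a": max(u for _, u in toy_profitable(s))}  # an all-switches-like choice


final, snares = non_oblivious({"a": "0"}, toy_profitable, toy_back_edge, toy_snare, toy_policy)
print(final, snares)
```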

We intend to take an oblivious switching policy $\alpha$ as the base of our non-oblivious switching policy.
This means that when we do not choose to use the procedure FixSnare, we will switch the subset of
profitable edges that would be chosen by $\alpha$. Our goal is to use FixSnare only when doing so is guaranteed
to yield a larger increase in valuation than applying $\alpha$. Clearly, in order to achieve this we must know how
much the valuations increase when $\alpha$ is applied and how much the valuations increase when FixSnare is
applied.

Determining the increase in valuation that is produced by applying an oblivious switching policy is
easy. Since every iteration of oblivious strategy improvement takes polynomial time, we can simply
switch the edges and measure the difference between the valuations of the current strategy and of the one
that would be produced. Let $\sigma$ be a strategy and let $P$ be the set of edges that are profitable in $\sigma$. For an
oblivious switching policy $\alpha$, the increase of applying $\alpha$ is defined to be:

$$\mathrm{Increase}(\alpha, \sigma) = \sum_{v \in V} \left(\mathrm{Val}^{\sigma[\alpha(P)]}(v) - \mathrm{Val}^{\sigma}(v)\right)$$
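As a small illustration of the definition of Increase, the sketch below sums the per-vertex differences between two valuation tables. It assumes, as a simplification, that both valuations are finite and are given as dictionaries over the same vertex set; computing the valuations themselves is outside the scope of the sketch, and the vertex names and numbers in the example are made up.

```python
from typing import Dict


def increase(val_before: Dict[str, float], val_after: Dict[str, float]) -> float:
    """Increase(alpha, sigma): sum over v of Val^{sigma[alpha(P)]}(v) - Val^{sigma}(v).

    val_before holds Val^sigma and val_after the valuation after switching the
    edges chosen by alpha; both are assumed finite and defined on the same vertices.
    """
    return sum(val_after[v] - val_before[v] for v in val_before)


# Made-up valuations: a rises by 1, b rises by 4, c is unchanged.
print(increase({"a": 1, "b": 0, "c": 5}, {"a": 2, "b": 4, "c": 5}))
```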

We now give a lower bound on the increase in valuation that an application of FixSnare produces.
Let $(W, \chi)$ be a snare and suppose that the current strategy $\sigma$ is inconsistent with this snare. Our lower
bound is based on the fact that FixSnare will produce a strategy that is consistent with the snare. This
means that Min's best response is not currently choosing an escape from the snare, but it will be forced
to do so after FixSnare has been applied. It is easy to see that forcing the best response to use a different
edge will cause an increase in valuation, since otherwise the best response would already be using that
edge. Therefore, we can use the increase in valuation that will be obtained when Min is forced to use
an escape. We define:

$$\mathrm{SnareIncrease}_{\sigma}(W, \chi) = \min\{(\mathrm{Val}^{\sigma}(y) + w(x)) - \mathrm{Val}^{\sigma}(x) : (x, y) \in \mathrm{Esc}(W)\}$$

This expression gives the smallest possible increase in valuation that can happen when Min is forced to
use an edge in $\mathrm{Esc}(W)$. We can prove that applying FixSnare will cause an increase in valuation of at
least this amount.
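The definition of SnareIncrease above can be evaluated directly once the current valuation, the vertex weights $w$ and the escape edges of the snare are known. The following sketch assumes finite valuations stored in dictionaries, and it raises an error for an empty escape set, a case that the definition leaves undefined; all vertex names and numbers in the example are made up.

```python
from typing import Dict, Iterable, Tuple


def snare_increase(
    val: Dict[str, float],
    weight: Dict[str, float],
    escapes: Iterable[Tuple[str, str]],
) -> float:
    """min over escape edges (x, y) of (Val^sigma(y) + w(x)) - Val^sigma(x)."""
    gains = [(val[y] + weight[x]) - val[x] for x, y in escapes]
    if not gains:
        raise ValueError("Esc(W) is empty, so SnareIncrease is undefined")
    return min(gains)


# Made-up data: escaping at p would give (10 + 1) - 3 = 8, at q (4 + 2) - 1 = 5.
val = {"p": 3, "q": 1, "out1": 10, "out2": 4}
weight = {"p": 1, "q": 2}
print(snare_increase(val, weight, [("p", "out1"), ("q", "out2")]))
```

Taking the minimum over all escapes is what makes the value a guaranteed lower bound: Min's best response will pick whichever escape hurts Max the least.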

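Proposition 17. Let $\sigma$ be a strategy that is not consistent with a snare $(W, \chi)$, and let $\sigma'$ be the result of
FixSnare$(\sigma, (W, \chi))$. We have:

$$\sum_{v \in V} \left(\mathrm{Val}^{\sigma'}(v) - \mathrm{Val}^{\sigma}(v)\right) \geq \mathrm{SnareIncrease}_{\sigma}(W, \chi)$$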

We now have the tools necessary to construct our proposed augmentation scheme, which is shown as
Algorithm 3. The idea is to compare the increase obtained by applying $\alpha$ and the increase obtained by
applying FixSnare with the best snare that has been previously recorded, and then to apply FixSnare only
when it is guaranteed to yield a larger increase in valuation.
Algorithm 3 (Augment(α))(σ, S)

(W, χ) := argmax_{(X, μ) ∈ S} SnareIncrease_σ(X, μ)
if Increase(α, σ) > SnareIncrease_σ(W, χ) then
    P := {(v, u) : (v, u) is profitable in σ}
    σ := σ[α(P)]
else
    σ := FixSnare(σ, (W, χ))
end if
return σ
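The decision rule of Algorithm 3 can be sketched as a single Python function. This is an illustrative sketch, not the paper's code: `increase_alpha`, `snare_increase`, `apply_alpha` and `fix_snare` are hypothetical callbacks, and the fallback to the base policy when no snare has been recorded is our own assumption, because the argmax in Algorithm 3 is undefined for an empty S.

```python
from typing import Callable, List, TypeVar

StrategyT = TypeVar("StrategyT")
SnareT = TypeVar("SnareT")


def augment_step(
    sigma: StrategyT,
    snares: List[SnareT],
    increase_alpha: Callable[[StrategyT], float],
    snare_increase: Callable[[StrategyT, SnareT], float],
    apply_alpha: Callable[[StrategyT], StrategyT],
    fix_snare: Callable[[StrategyT, SnareT], StrategyT],
) -> StrategyT:
    """One iteration of Augment(alpha) following Algorithm 3 (callbacks are hypothetical)."""
    if not snares:
        # Assumption: with no recorded snare the argmax is undefined, so use alpha.
        return apply_alpha(sigma)
    best = max(snares, key=lambda snare: snare_increase(sigma, snare))
    if increase_alpha(sigma) > snare_increase(sigma, best):
        return apply_alpha(sigma)  # sigma := sigma[alpha(P)]
    return fix_snare(sigma, best)  # sigma := FixSnare(sigma, (W, chi))


# Toy numbers in the spirit of Section 8.1: alpha gains 5, snare B guarantees 2**100.
bounds = {"A": 3, "B": 2 ** 100}
choice = augment_step("s0", ["A", "B"], lambda s: 5, lambda s, n: bounds[n],
                      lambda s: "alpha", lambda s, n: "FixSnare on " + n)
print(choice)
```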




Figure 3: A component of Friedmann's exponential time example. 



8 Comparison With Oblivious Strategy Improvement 



In this section we will demonstrate how non-oblivious strategy improvement can behave well in situa- 
tions where oblivious strategy improvement has exponential time behaviour. Unfortunately, there is only 
one source of examples with such properties in the literature, and that is the family of examples given 
by Friedmann. In fact, Friedmann gives two slightly different families of hard examples. The first type
is the family that forces exponential behaviour for the all-switches policy Q, and the second type
is the family that forces exponential behaviour for both all-switches and optimal switching policies [6].
Although our algorithm performs well on both families, we will focus on the example that was designed
for optimal switching policies because it is the more interesting of the two.

This section is split into two parts. In the first half of this section we will study a component part of 
Friedmann's example upon which the procedure FixSnare can outperform an optimal switching policy.
This implies that there are situations in which our augmentation scheme will choose to use FixSnare. 
In the second half, we will show how the good performance on the component part is the key property 
that allows our non-oblivious strategy improvement algorithms to terminate quickly on Friedmann's 
examples. 
8.1 Optimal Switching Policies

We have claimed that the procedure FixSnare can cause a greater increase in valuation than switching any 
subset of profitable edges. We will now give an example upon which this property holds. The example 
that we will consider is shown in Figure 3, and it is one of the component parts of Friedmann's family of
examples that force optimal policies to take an exponential number of steps [6].

The diagram shows a strategy for Max as a set of dashed edges. It also shows Min's best response 
to this strategy as a dotted edge. Even though this example could be embedded in an arbitrary game, 
we can reason about the behaviour of strategy improvement by specifying, for each edge that leaves the 
example, the valuation of the successor vertex that the edge leads to. These valuations are shown as 
numbers at the end of each edge that leaves the example. 

In order to understand how strategy improvement behaves we must determine the set of edges that are 
profitable for our strategy. There are two edges that are profitable: the edge (z,v) is profitable because the 
valuation of v is 2 which is greater than 0, and the edge at x that leaves the example is profitable because 
leaving the example gives a valuation of 2 and the valuation of y is 1. The edge (y,z) is not profitable 
because the valuation of z is 0, which is smaller than the valuation of 1 obtained by leaving the example 
at y. 
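The profitability check in the paragraph above can be reproduced mechanically. The sketch assumes the rule that an edge $(v, u)$ is profitable when the valuation of $u$ is strictly larger than the valuation of the successor currently chosen at $v$, which is exactly the comparison made above. The valuations are the ones stated in the text; the names exit_x, exit_y and exit_z for the edges leaving the example are made up, and the current strategy (x plays to y, while y and z leave the example) is our reading of Figure 3.

```python
from typing import Dict, List, Set, Tuple


def profitable_edges(
    sigma: Dict[str, str],
    succ: Dict[str, List[str]],
    val: Dict[str, float],
) -> Set[Tuple[str, str]]:
    """Edges (v, u) at Max vertices whose target beats the current successor."""
    return {
        (v, u)
        for v, targets in succ.items()
        for u in targets
        if val[u] > val[sigma[v]]  # strictly better than the edge sigma uses now
    }


val = {"v": 2, "y": 1, "z": 0, "exit_x": 2, "exit_y": 1, "exit_z": 0}
succ = {"x": ["y", "exit_x"], "y": ["z", "exit_y"], "z": ["v", "exit_z"]}
sigma = {"x": "y", "y": "exit_y", "z": "exit_z"}
print(sorted(profitable_edges(sigma, succ, val)))
```

The result contains (z, v) and the exit at x, but not (y, z), matching the analysis in the text.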

For the purposes of demonstration, we will assume that no other edge is profitable in the game into 
which the example is embedded. Furthermore, we will assume that no matter what profitable edges are 
chosen to be switched, the valuation of every vertex not contained in the example will remain constant. 
Therefore, the all-switches policy will switch the edge (z,v) and the edge leading away from the example
at the vertex x. It can easily be verified that this is also the optimal subset of profitable edges, and so
the all-switches and the optimal policies make the same decisions for this strategy. After switching the
edges chosen by the two policies, the valuation of x will rise to 2, the valuation of z will rise to 3, and the
valuation of y will remain at 1.

By contrast, we will now argue that non-oblivious strategy improvement would raise the valuations
of x, y, and z to $2^{100} + 1$. First, it is critical to note that the example is a snare. If we set $W = \{v, x, y, z\}$
and choose $\chi$ to be the partial strategy for Max that chooses the edges (x,y), (y,z), and (z,v), then $(W, \chi)$
will be a snare in every game into which the example is embedded. This is because there is only one
cycle in the subgame induced by $W$ when Max plays $\chi$, and this cycle has positive weight.

Now, if the non-oblivious strategy improvement algorithm was aware of the snare $(W, \chi)$, then the
lower bound given by Proposition 17 would be $2^{100}$. This is because closing the cycle forces Min's
best response to use an escape edge to avoid losing the game. Since $2^{100}$ is much larger than the increase
obtained by the optimal switching policy, the policies Augment(All) and Augment(Optimal) will choose
to run FixSnare on the snare $(W, \chi)$. One consequence of this is that the policy Optimal is no longer
optimal in the non-oblivious setting.

8.2 Friedmann's Exponential Time Examples 

The example that we gave in the previous subsection may appear to be trivial. After all, if the valuations 
outside the example remain constant then both the all-switches and optimal switching policies will close 
the cycle in two iterations. A problem arises, however, when the valuations can change. Note that when 
we applied the oblivious policies to the example, no progress was made towards closing the cycle. We 
started with a strategy that chose to close the cycle at only one vertex, and we produced a strategy that 
chose to close the cycle at only one vertex. When the assumption that valuations outside the example 
are constant is removed, it becomes possible for a well-designed game to delay the closing of the cycle
for an arbitrarily large number of iterations simply by repeating the pattern of valuations that is shown in
Figure 3.
Figure 4: The bits of a binary counter.



Friedmann's family of examples exploits this property to build a binary counter, which uses the
subgame shown in Figure 3 to represent the bits. The general idea of this approach is shown in Figure 4.
Friedmann's example uses n instances of the cycle, indexed 1 through n. These bits are interconnected
in a way that enforces two properties on both the all-switches and the optimal switching policies. First,
the ability to prevent a cycle from closing that we have described is used to ensure that the cycle with
index i can only be closed after every cycle with index smaller than i has been closed. Second, when
the cycle with index i is closed, every cycle with index smaller than i is forced to open. Finally, every
cycle is closed in the optimal strategy for the example. Now, if the initial strategy is chosen so that every
cycle is open, then these three properties are sufficient to force both switching policies to take at least $2^n$
steps before terminating.
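The counting argument can be illustrated with a toy model that keeps only the ordering properties listed above and nothing else about Friedmann's games. In the sketch, `closed[i]` records whether the cycle with index i + 1 is closed; each loop iteration closes the only cycle the properties allow and then reopens every lower cycle. The toy counts only cycle-closing iterations, so it reports $2^n - 1$ (one closing per counter value after the all-open start); iterations in which no cycle closes are not modelled.

```python
def oblivious_closings(n: int) -> int:
    """Count cycle-closing iterations in a toy model of the binary counter."""
    closed = [False] * n  # initial strategy: every cycle is open
    closings = 0
    while not all(closed):
        # A cycle can close only once every lower cycle is closed, so the only
        # candidate is the lowest open cycle.
        i = closed.index(False)
        closed[i] = True
        # Closing cycle i forces every lower cycle open again.
        for j in range(i):
            closed[j] = False
        closings += 1
    return closings


for n in range(1, 6):
    print(n, oblivious_closings(n))
```

Each iteration is exactly a binary increment of the vector `closed`, which is why the count doubles with every extra bit.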

The example works by forcing the oblivious switching policy to make the same mistakes repeatedly.
To see this, consider the cycle with index n − 1. When the cycle with index n is closed for the first
time, this cycle is forced open. The oblivious optimal switching policy will then not close it again for
at least another $2^{n-1}$ steps. By contrast, the policies Augment(All) and Augment(Optimal) would close
the cycle again after a single iteration. This breaks the exponential time behaviour, and it turns out that
both of our policies terminate in polynomial time on Friedmann's examples.
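The effect of re-closing a cycle in a single iteration can be seen in the same toy counter. This is a modelling sketch under two labelled assumptions, not a simulation of the augmented policies: a cycle's snare is taken to be recorded the first time that cycle closes, and any reopened cycle with a recorded snare is closed again in one iteration, as the paragraph above describes. Under these assumptions the count drops from $2^n - 1$ to $n(n+1)/2$.

```python
def non_oblivious_closings(n: int) -> int:
    """Toy counter where reopened cycles with a recorded snare are re-closed at once."""
    closed = [False] * n
    recorded = [False] * n  # assumption: recorded the first time the cycle closes
    iterations = 0
    while not all(closed):
        known_open = [j for j in range(n) if not closed[j] and recorded[j]]
        if known_open:
            # Assumption: FixSnare on a recorded snare closes its cycle in one iteration.
            closed[known_open[0]] = True
        else:
            # Otherwise the base policy closes the lowest open cycle, which forces
            # every lower cycle open again.
            i = closed.index(False)
            closed[i] = True
            recorded[i] = True
            for j in range(i):
                closed[j] = False
        iterations += 1
    return iterations


for n in range(1, 6):
    print(n, non_oblivious_closings(n))
```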

Of course, for Friedmann's examples we can tell simply by inspection that Max always wants to keep 
the cycle closed. It is not difficult, however, to imagine an example which replaces the four-vertex cycle
with a complicated subgame, for which Max has a winning strategy and Min's only escape is to play
to the vertex with a large weight. This would still be a snare, but the fact that it is a snare would only 
become apparent during the execution of strategy improvement. Nevertheless, as long as the complicated 
subgame can be solved in polynomial time by non-oblivious strategy improvement, the whole game will 
also be solved in polynomial time. This holds for exactly the same reason as the polynomial behaviour 
on Friedmann's examples: once the snare representing the subgame has been recorded then consistency 
with that snare can easily be enforced in the future. 



9 Conclusions and Further Work 

In this paper we have uncovered and formalized a strong link between the snares that exist in a game 
and the behaviour of strategy improvement on that game. We have shown how awareness of this link can 
be used to guide the process of strategy improvement. With our augmentation procedure we gave one 
reasonable method of incorporating non-oblivious techniques into traditional strategy improvement, and
we have demonstrated how these techniques give rise to good behaviour on the known exponential time 
examples. 

It must be stressed that we are not claiming that simply terminating in polynomial time on Friedmann's
examples is a major step forward. After all, the randomized switching policies of Bjorklund
and Vorobyov [2] have the same property. What is important is that our strategy improvement algorithms
are polynomial because they have a better understanding of the underlying structure of strategy
improvement. Friedmann's examples provide an excellent cautionary tale that shows how ignorance of
this underlying structure can lead to exponential time behaviour.

There is a wide variety of questions raised by this work. First, there is the structure of
snares in parity and mean-payoff games. Theorem 4 implies that all algorithms that find winning strategies
for parity and mean-payoff games must, at least implicitly, consider snares. We therefore propose
that a thorough and complete understanding of how snares arise in a game is a necessary condition for
devising a polynomial time algorithm for these games.

It is not currently clear how the snares in a game affect the difficulty of solving that game. It is 
not difficult, for example, to construct a game in which there is an exponential number of Max snares: in
a game in which every weight is positive there will be a snare for every connected subset of vertices. 
However, games with only positive weights have been shown to be very easy to solve 0. Clearly, the 
first challenge is to give a clear formulation of how the structure of the snares in a given game affects the 
difficulty of solving it. 

In our attempts to construct intelligent non-oblivious strategy improvement algorithms we have continually
had problems with examples in which Max and Min snares overlap. By this we mean that the sets
of vertices that define the subgames of the snares have a non-empty intersection. We therefore think that
studying how complex the overlapping of snares can be in a game may lead to further insight. There are 
reasons to believe that these overlappings cannot be totally arbitrary, since they arise from the structure 
of the game graph and the weights assigned to the vertices. 

We have presented a non-oblivious strategy improvement algorithm that passively records the snares 
that are discovered by an oblivious switching policy, and then uses those snares when doing so is guar- 
anteed to lead to a larger increase in valuations. While we have shown that this approach can clearly 
outperform traditional strategy improvement, it does not appear to immediately lead to a proof of poly- 
nomial time termination. It would be interesting to find an exponential time example for the augmented 
versions of the all-switches policy or of the optimal policy. This may be significantly more difficult 
since it is no longer possible to trick strategy improvement into making slow progress by forcing it to 
repeatedly close a small number of snares. 

There is no inherent reason why strategy improvement algorithms should be obsessed with trying 
to increase valuations as much as possible in each iteration. Friedmann's exponential time example for 
the optimal policy demonstrates that doing so in no way guarantees that the algorithm will always make 
good progress. Our work uncovers an alternate objective that strategy improvement algorithms can use 
to measure their progress. Strategy improvement algorithms could actively try to discover the snares 
that exist in the game, or they could try to maintain consistency with as many snares as possible, for
example. There is much scope for an intelligent snare based strategy improvement algorithm. 

We have had some limited success in designing intelligent snare based strategy improvement algo- 
rithms for parity games. We have developed a non-oblivious strategy improvement algorithm which, 
when given a list of known snares in the game, either solves the game or finds a snare that is not in 
the list of known snares. This gives the rather weak result of a strategy improvement algorithm whose 
running time is polynomial in |V| and k, where k is the number of Max snares that exist in the game.
This is clearly unsatisfactory since we have already argued that k could be exponential in the number of 
vertices. However, this is one example of how snares can be applied to obtain new bounds for strategy 
improvement. As an aside, the techniques that we used to obtain this algorithm do not generalize to
mean-payoff games. Finding a way to accomplish this task for mean-payoff games is an obvious starting 
point for designing intelligent snare based algorithms for this type of game. 

Acknowledgements. I am indebted to Marcin Jurdziński for his guidance, support, and encouragement
during the preparation of this paper. 
A Proofs for Section 3

A.1 Proof of Theorem 4

Proof. For the sake of contradiction, suppose that $\tau$ is a winning strategy for $S$ that does not choose an
edge in $\mathrm{Esc}(W)$. Since $\chi$ also does not choose an edge that leaves $W$, we have that $\mathrm{Play}(v, \chi, \tau)$ never
leaves the set $W$, for every vertex $v$ in $W$. Furthermore, since $\chi$ is a winning strategy for the subgame
induced by $W$, the payoff of $\mathrm{Play}(v, \chi, \tau)$ is greater than 0 for every vertex $v$ in $W$, which contradicts the
fact that $\tau$ is a winning strategy for $S$. □

B Proofs for Section 5

B.1 Proof of Proposition 9

Proof. The algorithm is shown as Algorithm 4. We use the notation $G \upharpoonright U$ to refer to the sub-game
of $G$ induced by the set of vertices $U$. Its correctness follows from a result of Zielonka [16], which
was originally shown for parity games, but identical techniques apply in this setting. Let $W$ be a set of
vertices; we define the set of vertices from which Max can force the token into $W$ in one step as

$$\mathrm{Pre}(W) = \{v \in V_{\mathrm{Max}} : \text{there is an edge } (v, u) \text{ with } u \in W\} \cup \{v \in V_{\mathrm{Min}} : \text{all edges } (v, u) \text{ have } u \in W\}.$$

We then define the attractor of $W$ to be the set of vertices from which Max can force play into $W$:

$$W_0 = W, \qquad W_i = W_{i-1} \cup \mathrm{Pre}(W_{i-1}) \text{ for } i \geq 1, \qquad \mathrm{Attr}(W) = W_{|V|}.$$

Since the sets $W_i$ only grow, the sequence has stabilized by index $|V|$. Zielonka showed that if $W$ is a
subset of Max's winning set, which is the set of vertices with value greater than 0, then both winning sets
can be found by solving the sub-game $G \upharpoonright (V \setminus \mathrm{Attr}(W))$.

In our setting the algorithm $\alpha$ finds the set $W$, and it is clear that the loop computes $\mathrm{Attr}(W)$.
Therefore, we get that our algorithm finds the 0-mean partition. Moreover, since each recursive call decreases
the size of the game by at least one vertex, we get that at most $|V|$ calls to $\alpha$ are made.

Algorithm 4 ZeroMeanPartition(σ, α, G)

σ := α(σ)
while there is an edge (v, u) with Val^σ(v) < ∞ and Val^σ(u) = ∞ do
    σ := σ[v ↦ u]
end while
W_{>0} := {v : Val^σ(v) = ∞}
U := V \ W_{>0}
(W_{≤0}, W'_{>0}) := ZeroMeanPartition(σ, α, G ↾ U)
return (W_{≤0}, W_{>0} ∪ W'_{>0})

□
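The attractor used in this proof can be computed by iterating the definitions of Pre and Attr until the set stops growing. The sketch below is a direct transcription under simple assumptions: the game is given as hypothetical sets `v_max` and `v_min` of vertex names plus a successor map `succ`, and every vertex is assumed to have at least one outgoing edge (otherwise `all()` would be vacuously true at a Min vertex). The example game is made up.

```python
from typing import Dict, List, Set


def pre(W: Set[str], v_max: Set[str], v_min: Set[str], succ: Dict[str, List[str]]) -> Set[str]:
    """Vertices from which Max can force the token into W in one step."""
    from_max = {v for v in v_max if any(u in W for u in succ[v])}
    # Assumes every vertex has an outgoing edge; all() is vacuously true otherwise.
    from_min = {v for v in v_min if all(u in W for u in succ[v])}
    return from_max | from_min


def attractor(W: Set[str], v_max: Set[str], v_min: Set[str], succ: Dict[str, List[str]]) -> Set[str]:
    """Iterate W_i = W_{i-1} | Pre(W_{i-1}) until it stops growing (at most |V| rounds)."""
    current = set(W)
    while True:
        nxt = current | pre(current, v_max, v_min, succ)
        if nxt == current:
            return current
        current = nxt


# Made-up game: Max owns t, a and b; Min owns c and d.
succ = {"t": ["t"], "a": ["t", "c"], "b": ["b"], "c": ["a", "t"], "d": ["d", "t"]}
print(sorted(attractor({"t"}, {"t", "a", "b"}, {"c", "d"}, succ)))
```

Here a joins in the first round because Max can move it to t, and c joins in the second round because both of its edges then lead into the set; b and d can avoid t forever.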
C Proofs for Section 6
C.1 Proof of Proposition M

Proof. Since $C$ is a critical set, every vertex in $C$ must be in the subtree of $v$ according to $\sigma$, and this
implies that $\sigma[v \mapsto u](w)$ is not the sink for every vertex $w$ in $C$. Note that only paths ending at the sink
can have finite valuations, and that no such paths can exist when $\sigma[v \mapsto u]$ is played in $G \upharpoonright C$. This
implies that $\mathrm{Val}^{\sigma[v \mapsto u]}_{G \upharpoonright C}(w)$ is either $\infty$ or $-\infty$, and we will argue that the latter is not possible.

Suppose, for the sake of contradiction, that there is a vertex $w$ with the property $\mathrm{Val}^{\sigma[v \mapsto u]}_{G \upharpoonright C}(w) = -\infty$.
Let $\tau$ be $\mathrm{br}_{G \upharpoonright C}(\sigma[v \mapsto u])$. We define $\tau'$ to be a strategy in $G$ that follows $\tau$ on the vertices in
$G \upharpoonright C$ and makes arbitrary decisions at the other vertices. For every vertex $w'$ in $V_{\mathrm{Min}}$ we choose some
edge $(w', x_{w'})$ and define

$$\tau'(w') = \begin{cases} \tau(w') & \text{if } w' \in C, \\ x_{w'} & \text{otherwise.} \end{cases}$$

Now consider $\sigma[v \mapsto u]$ played against $\tau'$ on the game $G$. Note that neither of the two strategies chooses
an edge that leaves the set $C$, and so $\mathrm{Play}_G(w, \sigma[v \mapsto u], \tau') = \mathrm{Play}_{G \upharpoonright C}(w, \sigma[v \mapsto u], \tau)$ for every vertex
$w$ in $C$. Since valuations can be derived from the play, this implies that $\mathrm{Val}^{\sigma[v \mapsto u], \tau'}_{G}(w) = -\infty$. By the
properties of the best response we have

$$\mathrm{Val}^{\sigma[v \mapsto u]}_{G}(w) \leq \mathrm{Val}^{\sigma[v \mapsto u], \tau'}_{G}(w) = -\infty < \mathrm{Val}^{\sigma}_{G}(w).$$

This contradicts Theorem 7, and so we can conclude that $\mathrm{Val}^{\sigma[v \mapsto u]}_{G \upharpoonright C}(w) = \infty$ for every vertex $w$ in $C$. □

C.2 Proof of Proposition M 

Proof. Consider a strategy $\tau$ for player Min for which there is no edge $(x, y)$ in $\mathrm{Esc}(\mathrm{Critical}_{\sigma}(v, u))$ with
$\tau(x) = y$. We argue that $\mathrm{Val}^{\sigma[v \mapsto u], \tau}(w) = \infty$ for every vertex $w$ in $\mathrm{Critical}_{\sigma}(v, u)$. Note that neither
$\sigma[v \mapsto u]$ nor $\tau$ chooses an edge that leaves $\mathrm{Critical}_{\sigma}(v, u)$, which implies that $\mathrm{Play}(w, \sigma[v \mapsto u], \tau)$ does
not leave $\mathrm{Critical}_{\sigma}(v, u)$, for every vertex $w$ in $\mathrm{Critical}_{\sigma}(v, u)$. By Proposition 12 we have that $\sigma[v \mapsto u]$
is a winning strategy for $G \upharpoonright \mathrm{Critical}_{\sigma}(v, u)$, and therefore $\mathrm{Val}^{\sigma[v \mapsto u], \tau}(w) = \infty$ for every vertex $w$ in
$\mathrm{Critical}_{\sigma}(v, u)$.

We will now construct a strategy for Min which, when played against $\sigma[v \mapsto u]$, guarantees a finite
valuation for some vertex in $\mathrm{Critical}_{\sigma}(v, u)$. Let $(x, y)$ be some edge in $\mathrm{Esc}(\mathrm{Critical}_{\sigma}(v, u))$. We define
the Min strategy $\tau$, for every vertex $w$ in $V_{\mathrm{Min}}$, as

$$\tau(w) = \begin{cases} y & \text{if } w = x, \\ \mathrm{br}(\sigma)(w) & \text{otherwise.} \end{cases}$$

By the definition of a critical set, $y$ cannot be in the subtree of $v$, since otherwise it would also be in
$\mathrm{Critical}_{\sigma}(v, u)$. This implies that $\mathrm{Play}(y, \sigma, \mathrm{br}(\sigma)) = \mathrm{Play}(y, \sigma[v \mapsto u], \tau)$, since $\tau = \mathrm{br}(\sigma)$ on every vertex
that is not in $\mathrm{Subtree}_{\sigma}(v)$, and $\sigma = \sigma[v \mapsto u]$ on every vertex other than $v$. From this we can conclude that
$\mathrm{Val}^{\sigma[v \mapsto u], \tau}(y) = \mathrm{Val}^{\sigma}(y) < \infty$. By construction of $\tau$ we have that $\mathrm{Val}^{\sigma[v \mapsto u], \tau}(x) = \mathrm{Val}^{\sigma[v \mapsto u], \tau}(y) + w(x)$,
and so we also have $\mathrm{Val}^{\sigma[v \mapsto u], \tau}(x) < \infty$.

In summary, we have shown that every Min strategy $\tau$ that does not use an edge in $\mathrm{Esc}(\mathrm{Critical}_{\sigma}(v, u))$
has the property $\mathrm{Val}^{\sigma[v \mapsto u], \tau}(w) = \infty$ for every vertex $w$ in $\mathrm{Critical}_{\sigma}(v, u)$. We have also shown that there
is a Min strategy $\tau$ which guarantees $\mathrm{Val}^{\sigma[v \mapsto u], \tau}(w) < \infty$ for some vertex $w$ in $\mathrm{Critical}_{\sigma}(v, u)$. From the
properties of a best response we can conclude that Min must use some edge in $\mathrm{Esc}(\mathrm{Critical}_{\sigma}(v, u))$. □
C.3 Proof of Proposition 15

Proof. In order to prove the claim we will construct an alternate game. We define the game
$G' = (V, V_{\mathrm{Max}}, V_{\mathrm{Min}}, E', w)$ where:

$$E' = \{(v, u) : \sigma(v) = u \text{ or } \mathrm{br}_G(\sigma)(v) = u \text{ or } \chi(v) = u\}.$$

In other words, we construct a game where Min is forced to play $\mathrm{br}_G(\sigma)$ and Max's strategy can be
constructed using a combination of the edges used by $\sigma$ and $\chi$. Since Min is forced to play $\mathrm{br}_G(\sigma)$,
we have that $\mathrm{Val}^{\sigma}_{G}(v) = \mathrm{Val}^{\sigma}_{G'}(v)$ for every vertex $v$. To decide if an edge is profitable we compare two
valuations, and since the valuation of $\sigma$ is the same in both $G$ and $G'$, an edge is profitable
for $\sigma$ in $G$ if and only if it is profitable for $\sigma$ in $G'$. Note also that the only way $\sigma$ can be modified in
$G'$ is to choose an edge that is chosen by $\chi$ but not by $\sigma$. Therefore, to prove our claim it is sufficient to
show that $\sigma$ has a profitable edge in $G'$.

We define the strategy:

$$\chi'(v) = \begin{cases} \chi(v) & \text{if } v \in W, \\ \sigma(v) & \text{otherwise.} \end{cases}$$

We will argue that $\chi'$ is a better strategy than $\sigma$ in $G'$. The definition of a snare implies that $\chi$ is a
winning strategy for the sub-game induced by $W$, and, since $\sigma$ is inconsistent with the snare, $\mathrm{br}(\sigma)$ does not
use an edge in $\mathrm{Esc}(W)$. We therefore have that $\mathrm{Val}^{\chi'}_{G'}(v) = \infty$ for every vertex $v$ in $W$. On the other hand, since
we are considering the positive cycle problem, we know that $\mathrm{Val}^{\sigma}_{G'}(v) < \infty$ for every vertex $v$ in $W$. This
implies that $\sigma$ is not the optimal strategy in $G'$. Theorem 7 implies that every non-optimal strategy must
have at least one profitable edge, and the only edges that can be profitable in $G'$ are those chosen by $\chi$.
Therefore there is some edge chosen by $\chi$ that is profitable for $\sigma$ in $G'$, and, as we have argued, this also
means that the edge is profitable for $\sigma$ in $G$. □
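The two objects built in this proof, the edge set $E'$ and the strategy $\chi'$, are simple to construct once strategies are stored as successor maps. The sketch below assumes hypothetical dictionaries for $\sigma$ (Max vertices), $\mathrm{br}_G(\sigma)$ (Min vertices) and $\chi$ (the Max vertices of $W$), and it assumes that $\chi$ is defined at every Max vertex of $W$. The toy data reuse the vertex names of Section 8.1 with made-up extra vertices m and n, and a best response at v that stays inside $W$, as the proof assumes.

```python
from typing import Dict, Set, Tuple


def alternate_edges(sigma: Dict[str, str], br_sigma: Dict[str, str],
                    chi: Dict[str, str]) -> Set[Tuple[str, str]]:
    """E' = {(v, u) : sigma(v) = u or br_G(sigma)(v) = u or chi(v) = u}."""
    return {(v, u) for strategy in (sigma, br_sigma, chi) for v, u in strategy.items()}


def chi_prime(sigma: Dict[str, str], chi: Dict[str, str], W: Set[str]) -> Dict[str, str]:
    """chi'(v) = chi(v) if v is in W, and sigma(v) otherwise (Max vertices only)."""
    return {v: (chi[v] if v in W else u) for v, u in sigma.items()}


sigma = {"x": "y", "y": "exit_y", "z": "exit_z", "m": "n"}
br_sigma = {"v": "x", "n": "m"}  # Min's best response; it does not escape from W
chi = {"x": "y", "y": "z", "z": "v"}
W = {"v", "x", "y", "z"}
print(sorted(alternate_edges(sigma, br_sigma, chi)))
print(chi_prime(sigma, chi, W))
```

In $G'$ each Min vertex keeps a single edge, which is how the construction forces Min to play $\mathrm{br}_G(\sigma)$.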



D Proofs for Section 7
D.1 Proof of Proposition 16

Proof. By Proposition 15 we know that as long as the current strategy is not consistent with the snare
$(W, \chi)$, there must be an edge $(v, u)$ with $\chi(v) = u$ that is profitable in $\sigma$. The switching policy will always
choose such an edge, and will terminate once the current strategy is consistent with the snare. An edge that
is already chosen by $\sigma$ cannot be profitable in $\sigma$, so every switch is made at a vertex where $\sigma$ and $\chi$ differ;
therefore in each iteration the number of vertices upon which $\sigma$ and $\chi$ differ decreases by 1. It follows that
after at most $|W|$ iterations we will have $\sigma(v) = \chi(v)$ for every vertex $v$ in $W$. Since $\chi$ is a winning strategy
for the sub-game induced by $W$, player Min must choose some edge that leaves $W$ to avoid losing once this
strategy has been reached, that is, the strategy is consistent with $(W, \chi)$. □

D.2 Proof of Proposition [17] 

Proof. We will prove this proposition by showing that there exists some vertex $w$ with the property
$\mathrm{Val}^{\sigma'}(w) - \mathrm{Val}^{\sigma}(w) \geq \mathrm{SnareIncrease}_{\sigma}(W, \chi)$. Since the procedure FixSnare switches only profitable edges,
we have by Theorem 7 that $\mathrm{Val}^{\sigma'}(v) - \mathrm{Val}^{\sigma}(v) \geq 0$ for every vertex $v$. Therefore, this is sufficient to prove
the proposition, because $\sum_{v \in V} (\mathrm{Val}^{\sigma'}(v) - \mathrm{Val}^{\sigma}(v)) \geq \mathrm{Val}^{\sigma'}(w) - \mathrm{Val}^{\sigma}(w)$.

Proposition 16 implies that $\sigma'$ is consistent with the snare $(W, \chi)$. By the definition of snare consistency,
this implies that $\mathrm{br}(\sigma')$ must use some edge $(w, x)$ in $\mathrm{Esc}(W)$. We therefore have that
$\mathrm{Val}^{\sigma'}(w) = \mathrm{Val}^{\sigma'}(x) + w(w)$. Since the FixSnare procedure switches only profitable edges, we have by Theorem 7
that $\mathrm{Val}^{\sigma'}(x) \geq \mathrm{Val}^{\sigma}(x)$. The increase at $w$ is therefore

$$\begin{aligned}
\mathrm{Val}^{\sigma'}(w) - \mathrm{Val}^{\sigma}(w) &= \mathrm{Val}^{\sigma'}(x) + w(w) - \mathrm{Val}^{\sigma}(w) \\
&\geq \mathrm{Val}^{\sigma}(x) + w(w) - \mathrm{Val}^{\sigma}(w) \\
&\geq \mathrm{SnareIncrease}_{\sigma}(W, \chi),
\end{aligned}$$

where the final step holds because $(w, x) \in \mathrm{Esc}(W)$. □